Abstract

When mathematicians present proofs they usually adapt their explanations to their didac- 
tic goals and to the (assumed) knowledge of their addressees. Modern automated theorem 
provers, in contrast, present proofs usually at a fixed level of detail (also called granularity). 
Often these presentations are neither intended nor suitable for human use. A challenge there- 
fore is to develop user- and goal-adaptive proof presentation techniques that obey common 
mathematical practice. We present a flexible and adaptive approach to proof presentation 
that exploits machine learning techniques to extract a model of the specific granularity of 
proof examples and employs this model for the automated generation of further proofs at an 
adapted level of granularity. 

Keywords: Adaptive proof presentation, proof tutoring, automated reasoning, machine learn- 
ing, granularity. 
[1] Let x be an element of A ∩ (B ∪ C), [2] then x ∈ A and x ∈ B ∪ C. [3] This means that x ∈ A, and
either x ∈ B or x ∈ C. [4] Hence we either have (i) x ∈ A and x ∈ B, or we have (ii) x ∈ A and x ∈ C. [5]
Therefore, either x ∈ A ∩ B or x ∈ A ∩ C, so [6] x ∈ (A ∩ B) ∪ (A ∩ C). [7] This shows that A ∩ (B ∪ C)
is a subset of (A ∩ B) ∪ (A ∩ C). [8] Conversely, let y be an element of (A ∩ B) ∪ (A ∩ C). [9] Then, either
(iii) y ∈ A ∩ B, or (iv) y ∈ A ∩ C. [10] It follows that y ∈ A, and either y ∈ B or y ∈ C. [11] Therefore,
y ∈ A and y ∈ B ∪ C so that y ∈ A ∩ (B ∪ C). [12] Hence (A ∩ B) ∪ (A ∩ C) is a subset of A ∩ (B ∪ C).
[13] In view of Definition 1.1.1, we conclude that the sets A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are equal.

Figure 1: Proof of the statement A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), reproduced from [4]



1 Introduction 



A key capability trained by students in mathematics and the formal sciences is the ability to con- 
duct rigorous arguments and proofs and to present them. Thereby, proof presentation is usually 
highly adaptive as didactic goals and (assumed) knowledge of the addressee are taken into
consideration. Modern automated theorem proving systems, however, often do not sufficiently address
this common mathematical practice. They typically generate and present proofs using very fine- 
grained and machine-oriented calculi. While some theorem proving systems exist — amongst 
them prominent interactive theorem provers — that provide means for human-oriented proof pre- 
sentations (e.g. proof presentation modules in Coq [17], Isabelle [18] and Theorema [19]), the 
challenge of supporting user- and goal-adapted proof presentations has been widely neglected 
in the past. This constitutes an unfortunate gap, in particular since mathematics and the formal 
sciences are increasingly targeted as promising application areas for intelligent tutoring systems. 
In this paper we present a flexible and adaptive approach to proof presentation that exploits ma- 
chine learning techniques to extract a model of the specific granularity of given proof examples, 
and that subsequently employs this model for the automated generation of further proofs at an 
adapted level of granularity. Our research has its roots in the collaborative DIALOG project [5] 
in which we developed means to employ the proof assistant ΩMEGA [16] for the dialog-based
teaching of mathematical proofs. In DIALOG we have considered a dynamic approach: Instead 
of guiding the student along a pre-defined path towards a solution, we support the dynamic explo- 
ration of proofs, using automated proof search. This presupposes the development of techniques 
to adequately model the proofs a student is supposed to learn. Inference steps in ΩMEGA are
implemented via an assertion application mechanism [8], which is based upon Serge Autexier's 
CoRe calculus [1] as its logical kernel. In assertion level proofs, all inference steps are justified 
by a mathematical fact, such as definitions, theorems and lemmas, but not by steps of a purely 
technical nature such as structural decompositions, as required, for example, in natural deduction 
or sequent calculi. 

The development of the dialog system prototype was guided by empirical studies using a 
mock-up of the DIALOG system [6]. One research challenge that emerged from the experiments
is the question of judging the appropriate step size of proof steps (in the context of tutoring), 
also referred to as the granularity of mathematical proofs. Even in introductory textbooks in 
mathematics, intermediate proof steps are skipped, when this seems appropriate. An example is 
the elementary proof in basic set theory reproduced in Figure 1. Whereas most of the proof steps 
consist of the application of exactly one mathematical fact (in this case, a definition or a lemma,
such as the distributivity of and over or), the step from assertion [9] to assertion [10] suggests the
application of several inference steps at once, namely the application of the definition of ∩ twice,
and then using the distributivity of and over or.

Similar observations were made in the empirical studies within the DIALOG project. In these
studies the tutors who helped to simulate the dialog system identified limits for how many
inference steps are to be allowed at once. An example of an unacceptably large student step that
was rejected by the tutor is shown in the following excerpt (the second student step):

student: (x, y) ∈ (R ∘ S)⁻¹
tutor: Now try to draw inferences from that!
(correct, appropriate, relevant)

student: (x, y) ∈ S⁻¹ ∘ R⁻¹
tutor: One cannot directly deduce that.
(correct, too coarse-grained, relevant)

The idea to represent proofs at different levels of detail was incorporated into ΩMEGA as a
hierarchically organized proof data structure [2]. The proof explanation system P.rex [9]
implemented the idea to generate adapted proof presentations by moving up or down these layers
on request. Alas, though the proofs at different levels of detail can be handled by the ΩMEGA
system, the problem remains of how to identify a particular level of granularity and how to
ensure that this level of granularity is appropriate. This observation also applies to the Edinburgh
HiProofs system [7].

Autexier and Fiedler have proposed one particular level of granularity [3], which they call 
what-you-need-is-what-you-stated granularity. Based on the assertion level inference mechanism
in ΩMEGA, they also developed a proof checking mechanism for this level. In brief, their
notion of granularity refers to such assertion level proofs, where all assertion level inference steps 
are spelled out explicitly and refer only to facts readily available from the assertions or the pre- 
vious inference steps. However, they conclude that even the simple proof in Figure 1 does not 
comply with their level of granularity, since the proof is missing some details. 

This paper presents in Section 2 an adaptive framework to model proof granularity. This
framework has been implemented as an extension of the ΩMEGA proof assistant and it is used to
generate proof presentations at specific granularity levels of interest. In Section 3 we illustrate
how our framework captures the granularity of our running example proof in Figure 1. Models
for granularity can be learned in our framework from samples, for which we employ standard
machine learning techniques, as demonstrated in Section 4.
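[Proof tree in sequent-style notation, listed here from the root upwards. Each line shows a sequent
and the assertion that is applied to it to obtain the sequent on the next line. S abbreviates
(A ∩ B) ∪ (A ∩ C) and T abbreviates A ∩ (B ∪ C).]

Root:
  ⊢ A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)                 Def eq (1)

Left branch:
  ⊢ A ∩ (B ∪ C) ⊆ S                                 Def ⊆ (2)
  x ∈ A ∩ (B ∪ C) ⊢ x ∈ S                           Def ∩ (3)
  x ∈ A ∧ x ∈ (B ∪ C) ⊢ x ∈ S                       Def ∪ (4)
  x ∈ A ∧ (x ∈ B ∨ x ∈ C) ⊢ x ∈ S                   Distr (5)
  (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C) ⊢ x ∈ S         Def ∩ (6)
  x ∈ (A ∩ B) ∨ (x ∈ A ∧ x ∈ C) ⊢ x ∈ S             Def ∩ (7)
  x ∈ (A ∩ B) ∨ x ∈ (A ∩ C) ⊢ x ∈ S                 Def ∪ (8)
  x ∈ S ⊢ x ∈ S

Right branch:
  ⊢ (A ∩ B) ∪ (A ∩ C) ⊆ T                           Def ⊆ (9)
  y ∈ S ⊢ y ∈ T                                     Def ∪ (10)
  y ∈ (A ∩ B) ∨ y ∈ (A ∩ C) ⊢ y ∈ T                 Def ∩ (11)
  (y ∈ A ∧ y ∈ B) ∨ y ∈ (A ∩ C) ⊢ y ∈ T             Def ∩ (12)
  (y ∈ A ∧ y ∈ B) ∨ (y ∈ A ∧ y ∈ C) ⊢ y ∈ T         Distr (13)
  y ∈ A ∧ (y ∈ B ∨ y ∈ C) ⊢ y ∈ T                   Def ∪ (14)
  y ∈ A ∧ y ∈ (B ∪ C) ⊢ y ∈ T                       Def ∩ (15)
  y ∈ T ⊢ y ∈ T

Figure 2: Assertion level proof for the statement A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)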



2 An Adaptive Model for Granularity 

We treat the granularity problem as a classification task: given a proof step, representing one or 
several assertion applications, we judge it as either appropriate, too big or too small. As our 
feature space we employ several mathematical and logical aspects of proof steps, but also aspects 
of a cognitive nature. For example, we keep track of the background knowledge of the user in a
student model. 



We illustrate our approach with an example proof step in Figure 1: [10] is derived from [9] by
applying the definition of ∩ twice, and then using the distributivity of and over or. In this step
(which corresponds to multiple assertion level inference steps) we make the following
observations:

(i) involved are two concepts: def. of ∩ and distributivity of and over or,

(ii) the total number of assertion applications is three,

(iii) all involved concepts have been previously applied in the proof,

(iv) all manipulations apply to a common part in [9],

(v) the names of the applied concepts are not explicitly mentioned, and

(vi) two of the assertion applications belong to naive set theory (def. of ∩) and one of them
relates to the domain of propositional logic (distributivity).
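The six observations above can be written down as a small record. The following Python sketch is only an illustration: the field names are hypothetical and do not correspond to the feature names used in the implementation, and only the values stated in observations (i) to (vi) are taken from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StepFeatures:
    """Features of one (possibly compound) proof step; field names are illustrative."""
    distinct_concepts: int          # observation (i)
    total_applications: int         # observation (ii)
    all_concepts_seen_before: bool  # observation (iii)
    same_subformula: bool           # observation (iv)
    concepts_verbalized: bool       # observation (v)
    set_theory_applications: int    # observation (vi)
    logic_applications: int         # observation (vi)


# The step from [9] to [10] in Figure 1, encoded with the values from the list above.
step_9_to_10 = StepFeatures(
    distinct_concepts=2,
    total_applications=3,
    all_concepts_seen_before=True,
    same_subformula=True,
    concepts_verbalized=False,
    set_theory_applications=2,
    logic_applications=1,
)

print(asdict(step_9_to_10))
```

A flat record of this kind is what a rule set or an off-the-shelf learner would consume as one instance.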

These observations can be represented as a feature vector,¹ where, in our example, the feature
"distinct concepts" receives a value of "2", and so forth. We express our models for classifying
granularity as rule sets, which associate specific combinations of feature values with a corresponding
granularity verdict ("appropriate", "too big" or "too small"). These rule sets may be hand-authored
by an expert or they may be learned from empirical data as we show in Section 4.
Our algorithm for granularity-adapted proof presentation takes two arguments, a granularity rule
set and an assertion level proof² as generated by ΩMEGA. Figure 2 shows the assertion level



¹Currently, we use around twenty features which are domain-independent, plus an indicator feature for each
definition or lemma, and one indicator feature for each theory.

²Our approach is not restricted to assertion level proofs and is also applicable to other calculi. However, in
mathematics education we consider single assertion level proof steps as the finest granularity level of interest. We
gained evidence for this choice from the empirical investigations in the DIALOG project (cf. [5] and [6]).
proof generated by ΩMEGA for our running example; this proof is represented as a tree (or acyclic
graph) in sequent-style notation and the proof steps are ordered. Currently we only consider plain
assertion level proofs, and do not assume any prior hierarchical structure or choices between proof
alternatives (as possible in ΩMEGA). Our algorithm performs an incremental categorization of
steps in the proof tree (where n = 0, ..., k denotes the ordered proof steps in the tree; initially n
is 1):

while there exists a proof step n do 

evaluate the granularity of the compound proof step n (i.e., the proof step consisting 
of all assertion level inferences performed after the last step labeled "appropriate with 
explanation" or "appropriate without explanation" — or the beginning of the proof, if 
none exists yet) with the given rule set under each of the following two assumptions: 
(i) assuming that the involved concepts are mentioned in the presentation of the step 
(an explanation), and (ii) assuming that only the resulting formula is displayed. 

1. if n is appropriate with explanation 

then label n as "appropriate with explanation"; set n := n+1; 

2. if n is too small with explanation, but appropriate without explanation 
then label n as "appropriate without explanation"; set n := n+1; 

3. if n is too small both with and without explanation 
then label n as "too small"; set n := n+1; 

4. if n is too big

then label n−1 as "appropriate without explanation" (i.e. consider the previous
step as appropriate), unless n−1 is labeled "appropriate with explanation" or
"appropriate without explanation" already or n is the first step in the proof (in
this special case label n as "appropriate with explanation" and set n := n+1).

od 
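The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ΩMEGA implementation: the function `label_steps`, the callable `classify` (which stands in for a rule set) and the toy rule set `by_size` are hypothetical names; indices are 0-based; and every verdict combination not covered by cases 1 to 3 is treated as "too big".

```python
from typing import Callable, List, Optional, Sequence

APPROPRIATE = ("appropriate with explanation", "appropriate without explanation")


def label_steps(
    steps: Sequence[object],
    classify: Callable[[Sequence[object], bool], str],
) -> List[Optional[str]]:
    """Label each assertion level inference in `steps`.

    `classify(compound, verbal)` returns "appropriate", "too big" or
    "too small" for a compound step; `verbal` says whether the involved
    concepts are mentioned in the presentation of the step.
    """
    labels: List[Optional[str]] = [None] * len(steps)
    start = 0  # first inference after the last step labeled appropriate
    n = 0      # 0-based index of the inference under consideration
    while n < len(steps):
        compound = steps[start:n + 1]
        with_expl = classify(compound, True)
        without_expl = classify(compound, False)
        if with_expl == "appropriate":                                    # case 1
            labels[n] = APPROPRIATE[0]
            n += 1
            start = n
        elif with_expl == "too small" and without_expl == "appropriate":  # case 2
            labels[n] = APPROPRIATE[1]
            n += 1
            start = n
        elif with_expl == "too small" and without_expl == "too small":    # case 3
            labels[n] = "too small"
            n += 1
        elif n == 0 or labels[n - 1] in APPROPRIATE:          # case 4, special case
            labels[n] = APPROPRIATE[0]
            n += 1
            start = n
        else:                                                 # case 4, regular case
            labels[n - 1] = APPROPRIATE[1]
            start = n  # n is evaluated again, now as the start of a new compound step
    return labels


def by_size(compound: Sequence[object], verbal: bool) -> str:
    """Toy rule set: up to two inferences are too small, three or more too big."""
    return "too small" if len(compound) <= 2 else "too big"


print(label_steps(["i1", "i2", "i3", "i4"], by_size))
```

In the regular branch of case 4 the index n is deliberately not advanced: once the previous inference closes a compound step, the same inference is classified again on its own, which is why the loop still terminates.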

We thereby obtain a proof tree with labeled steps (or labeled nodes) which differentiates between 
those nodes that are categorized as appropriate for presentation and those which are considered 
too fine-grained. Proof presentations are generated by walking through the tree,³ skipping the
steps labeled too small.⁴
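The presentation pass can be sketched as follows, assuming a single branch given as a list of (formula, concept) pairs together with the labels assigned by the categorization. The function `present` and the output pattern are hypothetical and only mimic the style of the generated proofs; the example data are the three inferences behind the step from [9] to [10].

```python
from typing import List, Sequence, Tuple


def present(steps: Sequence[Tuple[str, str]], labels: Sequence[str]) -> List[str]:
    """Render labeled inferences, skipping those labeled "too small".

    Concepts of skipped inferences are collected and mentioned at the next
    step that is labeled "appropriate with explanation".
    """
    lines: List[str] = []
    pending: List[str] = []  # concepts used since the last presented step
    for (formula, concept), label in zip(steps, labels):
        pending.append(concept)
        if label == "too small":
            continue  # withheld, but its concept stays in `pending`
        if label == "appropriate with explanation":
            unique = list(dict.fromkeys(pending))  # drop duplicates, keep order
            lines.append(f"Therefore, {formula} ...because of {', '.join(unique)}")
        else:  # "appropriate without explanation": only the formula is shown
            lines.append(f"Therefore, {formula}")
        pending = []
    return lines


steps = [
    ("(y ∈ A ∧ y ∈ B) ∨ y ∈ A ∩ C", "definition of intersection"),
    ("(y ∈ A ∧ y ∈ B) ∨ (y ∈ A ∧ y ∈ C)", "definition of intersection"),
    ("y ∈ A ∧ (y ∈ B ∨ y ∈ C)", "distributivity"),
]
labels = ["too small", "too small", "appropriate with explanation"]

for line in present(steps, labels):
    print(line)
```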

When modeling granularity as a categorization problem, we have to test the hypothesis that 
the combination of features we devise is useful for the classification task. I.e., we have to de- 
termine whether steps within a class (i.e. "appropriate", "too big" and "too small") can indeed 
be fruitfully characterized by specific combinations of feature values, and distinguished from the 
feature values that characterize the two other classes. Our methodology for evaluation of this 
hypothesis consists in case studies and in empirical evaluations with mathematics tutors. This is 
exemplified in the following two sections. 



³In case of several branches, a choice is possible which subtree to present first, a question which we do not address
in this paper.

⁴Even though the intermediate steps which are too small are withheld, the presentation of the output step reflects
the results of all intermediate assertion applications, since we include the names of all involved concepts whenever a
(compound) step is appropriate with explanation.
1. In view of Definition 1.1.1, we [show] that the sets A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are equal. [13]
[First we show] that A ∩ (B ∪ C) is a subset of (A ∩ B) ∪ (A ∩ C). [7] [Later we show] (A ∩ B) ∪
(A ∩ C) is a subset of A ∩ (B ∪ C). [12]

2. Let x be an element of A ∩ (B ∪ C), [1]

3. then x ∈ A and x ∈ B ∪ C. [2]

4. This means that x ∈ A, and either x ∈ B or x ∈ C. [3]

5. Hence we either have (i) x ∈ A and x ∈ B, or we have (ii) x ∈ A and x ∈ C. [4]

6. Therefore, either x ∈ A ∩ B or x ∈ A ∩ C, [5]

7. so x ∈ (A ∩ B) ∪ (A ∩ C). [6]

8. Conversely, let y be an element of (A ∩ B) ∪ (A ∩ C). [8]

9. Then, either (iii) y ∈ A ∩ B, or (iv) y ∈ A ∩ C. [9]

10. It follows that y ∈ A, and either y ∈ B or y ∈ C. [10]

11. Therefore, y ∈ A and y ∈ B ∪ C, [11]

12. so that y ∈ A ∩ (B ∪ C). [11]

(a)

1. We show that ((A ∩ B) ∪ (A ∩ C) ⊆ A ∩ B ∪ C) and (A ∩ B ∪ C ⊆ (A ∩ B) ∪ (A ∩ C))
...because of definition of equality

2. We assume x ∈ A ∩ B ∪ C and show x ∈ (A ∩ B) ∪ (A ∩ C)

3. Therefore, x ∈ A ∧ x ∈ B ∪ C

4. Therefore, x ∈ A ∧ (x ∈ B ∨ x ∈ C)

5. Therefore, (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C)

6. Therefore, x ∈ A ∩ B ∨ x ∈ A ∩ C

7. We are done with the current part of the proof (i.e., to show that x ∈ (A ∩ B) ∪ (A ∩ C)). [It
remains to be shown that (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ B ∪ C]

8. We assume y ∈ (A ∩ B) ∪ (A ∩ C) and show y ∈ A ∩ B ∪ C

9. Therefore, y ∈ A ∩ B ∨ y ∈ A ∩ C

10. Therefore, y ∈ A ∧ (y ∈ B ∨ y ∈ C)

11. Therefore, y ∈ A ∧ y ∈ B ∪ C

12. This finishes the proof. Q.e.d.

(b)

Figure 3: Comparison between (a) the (re-ordered) proof by Bartle and Sherbert [4] and (b) the
proof presentation generated with our rule set from the ΩMEGA proof in Figure 2



1) hypintro=1 ∧ total>1 ⇒ step-too-big

2) ∪-Defn∈{1,2} ∧ ∩-Defn∈{1,2} ⇒ step-too-big

3) ∩-Defn<3 ∧ ∪-Defn=0 ∧ masteredconceptsunique=1 ∧ unmasteredconceptsunique=0 ⇒ step-too-small

4) total<2 ∧ verb=true ⇒ step-too-small

5) masteredconceptsunique<3 ∧ unmasteredconceptsunique=0 ∧ verb=true ⇒ step-too-small

6) equalitydefn>0 ∧ verb=false ⇒ step-too-big

7) _ ⇒ step-appropriate

(a)

1) conceptsunique∈{0,1} ∧ equalitydefn=0 ∧ verb=true ⇒ step-too-small

2) hypintro=0 ∧ equalitydefn=0 ∧ ∪-Defn=0 ∧ verb=true ⇒ step-too-small

3) conceptsunique∈{2,3,4} ∧ ∪-Defn∈{1,2,3} ⇒ step-too-big

4) hypintro∈{1,2,3,4} ∧ conceptsunique∈{2,3,4} ⇒ step-too-big

5) unmasteredconceptsunique=0 ∧ total∈{0,1,2} ∧ ∩-Defn∈{1,2} ∧ close=false ⇒ step-too-small

6) equalitydefn∈{1,2} ∧ verb=false ⇒ step-too-big

7) equalitydefn∈{1,2} ∧ verb=true ⇒ step-appropriate

8) equalitydefn=0 ∧ verb=false ⇒ step-appropriate

9) _ ⇒ step-appropriate

(b)

Figure 4: Rule sets employed in the running example: (a) rule set generated by hand, (b) rule
set generated using C5.0 (ordered by the rules' confidence values)
3 Case Study



In this section, we model the step size of the textbook proof in Figure 1 as an example. The starting
point for the automated generation of our proof presentations are assertion level proofs in the
mathematics assistance system ΩMEGA. The basic assertion level proof, assuming the basic
definitions in naive set theory, is presented in Figure 2 as a sequent style proof tree.

This proof consists of fifteen assertion level inference applications, which refer to the defini- 
tions of equality, subset, union and intersection as well as the concept of distributivity. Notice 
that the proof in Figure 1 (taken from the textbook Bartle & Sherbert [4]) starts (in statement [1])
with the assumption that an element x is in the set A ∩ (B ∪ C). The intention is to show the
subset relation A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C). However, this is not explicitly revealed until
step [6], when this part of the proof is already finished. The same style of delayed justification 
for prior steps is employed towards the end of the proof, where statements [12] and [13] justify 
(or recapitulate) the preceding proof. It must be questioned whether this style of presentation, 
where the motivation for some of the steps (such as the above assumption) is only presented in
retrospect (when the assumption is discharged), is still the most effective one for instructing
students in our times. This style originated in former centuries, when the general task of the 
apprentice was to figure out the reason behind the procedures of his technically highly competent 
master with often poor teaching skills. 

Thus, for the modeling of step size, we consider a re-ordered variant of the steps in Figure 1,
which is displayed in Figure 3 (a).⁵ We now generate a proof presentation which matches the
step size of the twelve steps in the original proof, skipping intermediate proof steps according to
our feature-based granularity model. Figure 4 shows two sample rule sets which both lead to the
proof presentation in Figure 3 (b). The rule set in Figure 4 (a) was generated by hand, whereas
the rule set in Figure 4 (b) was generated with the help of the C5.0 data mining tool [15].⁶

The feature hypintro indicates whether a (multi-inference) proof step introduces a new hy- 
pothesis, and close indicates whether a branch of the proof has been finished. The feature total 
counts the number of assertion level inferences within one (multi-inference) step. Furthermore, 
the features masteredconceptsunique and unmasteredconceptsunique indicate how many of the 
employed concepts (if any) are supposed to be mastered or unmastered by the user according to a 
very basic student model (which is updated in the course of the proof). In addition, the
occurrences of particular defined notions are counted (via the features ∩-Defn, ∪-Defn, equalitydefn).
For example, the first rule in Figure 4 (a) can be interpreted as "If a step introduces a new hypoth- 
esis into the proof, and consists of more than one assertion level inference rule, it is considered 
too big." Note that rules 4-6 in Figure 4 (a) express the relation between the appropriateness of 
steps and whether the employed concepts are mentioned verbally (feature verb). Rule 6 has the 
effect of enforcing that the use of the definition of equality is always explicitly mentioned (as in 



⁵Note that step (1) in the re-ordered proof corresponds to the statements [7], [12] and [13] in the original proof
which jointly apply the concept of set equality.

⁶The sample proof was used to fit the rule set to it. All steps in the sample proof were provided as appropriate,
all intermediate assertion level steps were labeled as too-small, and always the next bigger step to each step in the 
original proof was provided as an example for a too big step. Care was taken that the default rule of the generated 
rule set is of class appropriate (which was achieved via the cost function), so that the rule set better transfers to other 
domains. Otherwise, in case the default class was too small, and the examined proof steps were sufficiently different 
from the generating sample (and thus failed to match the non-default rules), the resulting proof presentation would 
be excessively short. 
step 1 in Figure 3 (b)). All other cases, which are not covered by the previous rules, are subject to
a default rule. Rules are ordered by utility for conflict resolution.
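The hand-authored rule set of Figure 4 (a) can be read as an ordered list of condition/verdict pairs with a default. The Python sketch below assumes that the first matching rule determines the verdict (one plausible reading of "ordered by utility for conflict resolution"); the dictionary keys `union_defn` and `intersection_defn` are renamings of ∪-Defn and ∩-Defn, and the feature values in the usage example are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

Features = Dict[str, Union[int, bool]]
Rule = Tuple[Callable[[Features], bool], str]

# Rules 1) to 6) of Figure 4 (a), in the order given there.
RULES_A: List[Rule] = [
    (lambda f: f["hypintro"] == 1 and f["total"] > 1, "step-too-big"),
    (lambda f: f["union_defn"] in (1, 2) and f["intersection_defn"] in (1, 2),
     "step-too-big"),
    (lambda f: f["intersection_defn"] < 3 and f["union_defn"] == 0
     and f["masteredconceptsunique"] == 1
     and f["unmasteredconceptsunique"] == 0, "step-too-small"),
    (lambda f: f["total"] < 2 and f["verb"], "step-too-small"),
    (lambda f: f["masteredconceptsunique"] < 3
     and f["unmasteredconceptsunique"] == 0 and f["verb"], "step-too-small"),
    (lambda f: f["equalitydefn"] > 0 and not f["verb"], "step-too-big"),
]


def classify(features: Features) -> str:
    """Return the verdict of the first matching rule, else the default (rule 7)."""
    for condition, verdict in RULES_A:
        if condition(features):
            return verdict
    return "step-appropriate"


# Hypothetical step: introduces a hypothesis and spans two inferences (rule 1).
example = {
    "hypintro": 1, "total": 2, "union_defn": 0, "intersection_defn": 1,
    "masteredconceptsunique": 0, "unmasteredconceptsunique": 2,
    "equalitydefn": 0, "verb": True,
}
print(classify(example))
```

Keeping the default verdict outside the rule list mirrors the role of the final rule in the figure: it only applies when no other rule matches.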

The generated proof presentation in Figure 3 (b) consists, similarly to the proof in Figure 3 
(a), of twelve steps. The three assertion level steps (11), (12) and (13) are combined into one 
single step from (9) to (10) in Figure 3 (b). Natural language is produced via simple patterns. (A 
more exciting natural language generation is possible with Fiedler's mechanisms [9], but this is 
not the subject of this paper.) 

The rule sets in Figure 4 can be successfully reused for other examples in the domain as well.
In Figure 5, we present the resulting proof presentation when applying the rule set in Figure 4 (a)
to a different proof exercise, namely a proof of the theorem

(A ∩ B) \ C = A ∩ (B \ C).



1. We show that ((A ∩ B) \ C ⊆ A ∩ B \ C) and (A ∩ B \ C ⊆ (A ∩ B) \ C) ...because of definition of equality

2. We assume x ∈ A ∩ B \ C and show x ∈ (A ∩ B) \ C

3. Therefore, x ∈ A ∧ x ∈ B \ C

4. Therefore, x ∈ A ∧ x ∈ B ∧ ¬(x ∈ C)

5. We are done with the current part of the proof (i.e., to show that x ∈ (A ∩ B) \ C). It remains to be shown
that (A ∩ B) \ C ⊆ A ∩ B \ C.

6. We assume y ∈ (A ∩ B) \ C and show y ∈ A ∩ B \ C

7. Therefore, y ∈ A ∧ y ∈ B ∧ ¬(y ∈ C) ...similarly to steps nr. (3 4)

8. This finishes the proof. Q.E.D. ...similarly to step nr. 7

Figure 5: Sample proof presentation generated via the rule set in Figure 4 (a) for the theorem

(A ∩ B) \ C = A ∩ (B \ C)



PART decision list

total <= 2 AND total > 0 AND parapos <= 0: appropriate (85.0/4.0)

total <= 2 AND unmasteredconceptsunique <= 0: step-too-small (11.0/2.0)

parapos <= 0 AND samesub <= 0: step-too-big (22.0/5.0)

unmasteredconceptsunique <= 1 AND hypintro <= 0: appropriate (9.0)

: step-too-big (8.0/2.0)

Figure 6: Empirically learned rule set. The feature parapos indicates whether an inference 
has been applied only once in a proof situation where it could have been applied twice, in the 
same direction. The feature samesub indicates whether all inference applications within a (multi- 
inference) step apply to the same formula (and the same subparts thereof). 
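A decision list of this kind is evaluated from top to bottom, and the first rule whose conditions hold determines the class. The following Python sketch transcribes the list under two assumptions that should be checked against the original figure: the comparisons `total > 0` and `parapos <= 0` use the threshold 0, and boolean features are encoded as 0/1. The function name `part_verdict` and the example feature values are hypothetical.

```python
from typing import Dict


def part_verdict(f: Dict[str, int]) -> str:
    """Evaluate the decision list of Figure 6 top to bottom (first match wins)."""
    # Thresholds `total > 0` and `parapos <= 0` are assumed, see the lead-in.
    if f["total"] <= 2 and f["total"] > 0 and f["parapos"] <= 0:
        return "appropriate"
    if f["total"] <= 2 and f["unmasteredconceptsunique"] <= 0:
        return "step-too-small"
    if f["parapos"] <= 0 and f["samesub"] <= 0:
        return "step-too-big"
    if f["unmasteredconceptsunique"] <= 1 and f["hypintro"] <= 0:
        return "appropriate"
    return "step-too-big"  # default rule


# Hypothetical step of three inferences that touch different formulas.
three_steps = {
    "total": 3, "parapos": 0, "samesub": 0,
    "unmasteredconceptsunique": 0, "hypintro": 0,
}
print(part_verdict(three_steps))
```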
4 Learning from Empirical Data

Classification problems are a well-investigated topic in the machine learning community. There
exist off-the-shelf tools for learning classifiers (like our rule sets) from annotated examples
(supervised learning). In our case, an expert annotates proof steps with the labels appropriate, too
small or too big. Representing the proof steps in ΩMEGA has the advantage that all the features
of a particular proof step are computed in the background, and combined automatically with the
expert's judgments as training instances for the learning algorithm. Currently, our algorithm calls
the C5.0 data mining tools [15, 14], which support the learning of decision trees and of rule
sets, to obtain classifiers for granularity.

As part of an ongoing evaluation, we have conducted a study where a mathematician (with
tutoring experience) judged the granularity of 135 proof steps. These steps were presented to him
via an ΩMEGA-assisted environment which computed the feature values for granularity
classification in the background. The step size of proof steps presented to the expert was randomized,
such that each presented step corresponded to one, two, or three assertion level inference steps.
The presented proofs belonged to one exercise in naive set theory and three different exercises
about relations. We evaluated rule learning using C5.0 on our sample using 10-fold cross
validation, which resulted in a mean percentage of correct classification of 84.6%, and κ = 0.62. We
also used the PART classifier [10] included in the Weka suite⁷, which is inspired by Quinlan's
C4.5. After we excluded some of the attributes (in particular those that refer to the use of specific
concepts, i.e., Def. of ∩, Def. of ∘, etc.), PART achieved 86.7% of correctly classified instances in
stratified cross validation (κ = 0.68). Apparently, removal of the most domain-specific attributes
prevented the algorithm from overfitting. The resulting rule set is presented in Figure 6.
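For readers unfamiliar with the κ statistic reported alongside the accuracy: Cohen's κ corrects the observed agreement p_o between classifier and expert for the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes both figures from a confusion matrix. The matrix is made up purely for illustration and is not data from the study, and the function name is hypothetical.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are the expert's labels, columns the classifier's predictions.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


# Illustrative matrix only (classes: appropriate, too big, too small).
made_up = [
    [30, 5, 5],
    [5, 30, 5],
    [5, 5, 30],
]
accuracy, kappa = accuracy_and_kappa(made_up)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Because κ discounts chance agreement, it is lower than the raw accuracy whenever chance agreement is non-zero and the classifier is not perfect, which is also the pattern in the figures reported above.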

The feature parapos indicates whether an inference has been applied only once in a proof 
situation where it could have been applied twice, in the same direction. The feature samesub 
indicates whether all inference applications within a (multi-inference) step apply to the same 
formula (and the same subparts thereof). When applied to our running example, we obtain the 
proof presentation as shown in Figure 7. 

1. We show that ((A ∩ B) ∪ (A ∩ C) ⊆ A ∩ B ∪ C) and (A ∩ B ∪ C ⊆ (A ∩ B) ∪ (A ∩ C)) ...because of
definition of equality

2. We assume x ∈ A ∩ B ∪ C and show x ∈ (A ∩ B) ∪ (A ∩ C) ...because of definition of subset

3. Therefore, x ∈ A ∧ x ∈ B ∪ C ...because of definition of intersection

4. Therefore, x ∈ A ∧ (x ∈ B ∨ x ∈ C) ...because of definition of union

5. Therefore, x ∈ A ∧ x ∈ B ∨ x ∈ A ∧ x ∈ C ...because of logics

6. Therefore, x ∈ A ∩ B ∨ x ∈ A ∩ C ...because of definition of intersection ...similarly to step nr. 3

7. We are done with the current part of the proof (i.e., to show that x ∈ (A ∩ B) ∪ (A ∩ C)). It remains to be
shown that (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ B ∪ C. ...because of definition of union

8. We assume y ∈ (A ∩ B) ∪ (A ∩ C) and show y ∈ A ∩ B ∪ C ...because of definition of subset

9. Therefore, y ∈ A ∩ B ∨ y ∈ A ∩ C ...because of definition of union

10. Therefore, y ∈ A ∧ y ∈ B ∨ y ∈ A ∧ y ∈ C ...because of definition of intersection ...similarly to step nr. 3

11. Therefore, y ∈ A ∧ (y ∈ B ∨ y ∈ C) ...because of logics

12. Therefore, y ∈ A ∧ y ∈ B ∪ C ...because of definition of union

13. This finishes the proof. Q.e.d. ...because of definition of intersection

Figure 7: The assertion level proof in Figure 2 presented according to the rule set from Figure 6



⁷http://www.cs.waikato.ac.nz/~ml/weka/
To compare the rule-based classifiers with support vector machines, we applied SMO [13]
on our data, resulting in 83.0% correctness and κ = 0.51 in stratified cross validation, which is a
similar performance to C5.0.






5 Conclusion 

Granularity has been a challenge in AI for decades [11, 12]. Here we have focused on adap- 
tive proof granularity, which we treat as a classification problem. We model different levels of 
granularity using rule sets, which can be hand-coded or learned from sample proofs.

As a case study, we have formulated the granularity level of the proof in Figure 1 from the 
textbook [4] as a rule set in our classification-based approach. Classifiers are applied dynamically 
to each proof step, thus taking into account changeable information such as the user's familiarity 
with the involved concepts. Using assertion level proofs as the basis for our approach has the 
additional advantage that the relevant information for the classification task (e.g., the concept 
names) is easily read off the proofs. This also eases the generation of natural language proof 
output in general. 

Future work consists in empirical evaluations of the learning approach, addressing the following
questions:

(i) what are the most useful features for judging granularity, and are they different among
distinct experts and domains, and

(ii) what is the inter-rater reliability among different experts and the corresponding classifiers
generated by learning in our framework?

The resulting corpora of annotated proof steps and generated classifiers can then be used to eval- 
uate the appropriateness of the proof presentations generated by our system. 